CONFIDENCE INTERVAL METHODOLOGY
FOR RATIO MEANS (CIM4RM)


1.  INTRODUCTION 

The U.S. Army and many other government and private organizations need to evaluate
ratio means to help them make informed life cycle management decisions. A ratio mean is the
ratio of the means of two random variables, X and Y, whose corresponding terms are paired. The
following chart depicts this:

Table 1. Definition of Ratio Mean.

$$\frac{\sum_{j=1}^{n} x_j / n}{\sum_{j=1}^{n} y_j / n} = \frac{\bar{x}}{\bar{y}}$$

This is the sample ratio mean.


The pairs (x_j, y_j) are assumed to be independent. The correlation between X and Y may
be positive, negative or zero. The pairs are considered independent and identically
distributed (i.i.d.), with a common but unknown distribution.


For example, the Army tracks and evaluates the performance of many weapon systems
using ratio mean metrics, such as the maintenance ratio (MR). An MR estimate

$$\widehat{MR} = \frac{\sum_{j=1}^{n} \text{man-hours}_j}{\sum_{j=1}^{n} \text{miles}_j}$$

is based on a random sample (without replacement) of n vehicles from a finite population,
where a pair (man-hours, miles) is associated with each vehicle. This example assumes there
is no variation from visit to visit within each vehicle. A visit is defined to be any
timeline event that requires repair.
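
As a minimal sketch of the MR estimate, the Python below computes the sample ratio mean for a handful of vehicles. The function name `ratio_mean` and the data values are hypothetical, chosen only for illustration. It also shows that the ratio mean (a ratio of means) generally differs from the mean of the per-vehicle ratios.

```python
def ratio_mean(man_hours, miles):
    """Sample ratio mean: mean of man-hours divided by mean of miles."""
    if len(man_hours) != len(miles) or len(man_hours) == 0:
        raise ValueError("man_hours and miles must be paired and non-empty")
    n = len(man_hours)
    return (sum(man_hours) / n) / (sum(miles) / n)


# Hypothetical (man-hours, miles) data for five vehicles; illustrative only.
man_hours = [12.0, 8.0, 20.0, 5.0, 15.0]
miles = [300.0, 250.0, 400.0, 150.0, 400.0]

mr_hat = ratio_mean(man_hours, miles)  # 60 man-hours over 1500 miles

# Mean of the per-vehicle ratios, shown only for contrast with the ratio mean.
mean_of_ratios = sum(mh / mi for mh, mi in zip(man_hours, miles)) / len(miles)

print(f"ratio mean = {mr_hat:.4f}, mean of ratios = {mean_of_ratios:.4f}")
```

The two numbers differ because the ratio mean weights each vehicle by its mileage, while the mean of ratios weights every vehicle equally.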


The Army cannot always afford to track every vehicle in its inventory. Therefore, ratio
mean performance metrics are tracked using a sample of vehicles over a given time period. The
Army has multiple goals or objectives. One of them is to decide whether maintenance
augmentation is necessary for a fleet of vehicles before a mission. Another goal might be to
determine if a fleet of vehicles has achieved acquisition thresholds and targets. These goals and
objectives are based on inferences from the sample using approximate confidence intervals (CI)
for ratio means. This paper discusses and develops the methodology that produces these
confidence intervals (Confidence Interval Methodology for Ratio Means, CIM4RM). The MR
will be used to develop the methodology.

CIM4RM combines an existing tool (the bootstrap-t approach [1], with no parametric
assumptions on the distributions) with a redefinition of the ratio mean as an arithmetic mean
to compute approximate confidence intervals for a ratio mean metric. The bootstrap-t approach
is well suited to a location statistic such as the ratio mean [2].

The bootstrap-t requires the estimation of three quantities based on a random sample
from a population: the mean, the standard error (SE), and the bootstrap standardized
Z distribution.

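The SE estimate for a ratio mean is [3]:

$$\widehat{SE}(\widehat{MR}) = \frac{\sqrt{(1-f)\left[\, s^2_{\text{man-hours}} + \widehat{MR}^2 \, s^2_{\text{miles}} - 2\,\widehat{MR}\,\operatorname{cov}(\text{man-hours}, \text{miles}) \right]}}{\sqrt{n}\;\overline{\text{miles}}} \qquad (1)$$

where n is the sample size and f is the sampling fraction (n/N); s^2_miles and
s^2_man-hours are the sample variances of miles and man-hours, respectively;
cov(man-hours, miles) is their sample covariance; and \overline{miles} is the sample mean
of miles.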

The SE estimate in equation (1) is only dependable for sample sizes greater than 30
where the coefficients of variation of both variables are less than 10% [3]. Therefore, another
approach is needed to estimate these three quantities for smaller samples and for larger samples
with high variation. The approach that accomplishes this is CIM4RM.
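
For comparison with what follows, here is a hedged sketch of the large-sample SE estimate referred to as equation (1), written for an MR with man-hours in the numerator and miles in the denominator. The function name `classical_ratio_se`, the `population_size` argument (N) and the data are assumptions made for illustration, not part of the original paper.

```python
import math
import statistics


def classical_ratio_se(man_hours, miles, population_size):
    """Large-sample SE of the ratio mean (man-hours over miles)."""
    n = len(man_hours)
    f = n / population_size  # sampling fraction n/N
    mean_mh = statistics.fmean(man_hours)
    mean_mi = statistics.fmean(miles)
    mr = mean_mh / mean_mi
    var_mh = statistics.variance(man_hours)  # sample variance, n - 1 divisor
    var_mi = statistics.variance(miles)
    cov = sum((a - mean_mh) * (b - mean_mi)
              for a, b in zip(man_hours, miles)) / (n - 1)
    inner = var_mh + mr ** 2 * var_mi - 2.0 * mr * cov
    inner = max(inner, 0.0)  # guard against tiny negative rounding error
    return math.sqrt((1.0 - f) * inner / n) / mean_mi


# Hypothetical data: 5 sampled vehicles out of an assumed fleet of 50.
man_hours = [12.0, 8.0, 20.0, 5.0, 15.0]
miles = [300.0, 250.0, 400.0, 150.0, 400.0]
se = classical_ratio_se(man_hours, miles, population_size=50)
print(f"classical SE estimate = {se:.5f}")
```

A sample this small falls outside the range in which the text calls this estimate dependable; the values are used only to exercise the arithmetic.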
2.  DEVELOPMENT  OF  METHODOLOGY  (CIM4RM)


Mean  and  SE  estimates  for  ratio  mean 


Let’s  take  a  random  sample  (without  replacement)  of  n  vehicles  (man-hours  and  miles  for 
each  vehicle)  from  the  population  of  N  vehicles.  Let's  redefine  the  sample  ratio  mean  to  be  a 
sample  arithmetic  mean.  The  following  chart  depicts  this  redefinition. 

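Table 2. Redefined Maintenance Ratio.

$$\text{AdjMR}_j = \frac{\text{man-hours}_j}{\sum_{k=1}^{n} \text{miles}_k / n}, \qquad \widehat{MR} = \frac{1}{n}\sum_{j=1}^{n} \text{AdjMR}_j = \text{Sample Arithmetic Mean}$$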


It can be shown that the sample ratio mean and the sample arithmetic mean of the AdjMR
values are the same quantity. The redefined variable AdjMR only accounts for variation in
man-hours and does not account for variation in miles. Correlation is also not accounted for
because the pairing of man-hours and miles for each vehicle has been eliminated.
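
The equality claimed above follows in one line. As a sketch, assuming AdjMR_j denotes vehicle j's man-hours divided by the sample mean of miles (the symbol \bar{m} for that mean is introduced here only for illustration):

```latex
\frac{1}{n}\sum_{j=1}^{n}\text{AdjMR}_j
  = \frac{1}{n}\sum_{j=1}^{n}\frac{\text{man-hours}_j}{\bar{m}}
  = \frac{\frac{1}{n}\sum_{j=1}^{n}\text{man-hours}_j}{\bar{m}}
  = \frac{\sum_{j=1}^{n}\text{man-hours}_j}{\sum_{j=1}^{n}\text{miles}_j},
\qquad
\bar{m} = \frac{1}{n}\sum_{j=1}^{n}\text{miles}_j .
```

Because \bar{m} is treated as a fixed constant within the sample, the spread of the AdjMR_j values reflects only the spread of man-hours, which is why variation in miles is not captured.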


The following is an estimate of the arithmetic mean of the population
($\overline{\text{AdjMR}}$) based on the sample of n AdjMR's [1]:

$$\widehat{MR} = \frac{1}{n}\sum_{j=1}^{n} \frac{\text{man-hours}_j}{\sum_{k=1}^{n} \text{miles}_k / n} \qquad (2)$$


The estimate of the SE of $\widehat{MR}$ using this same sample of n AdjMR's is the following [1]:

$$\widehat{SE} = \sqrt{\frac{\sum_{j=1}^{n} \left(\text{AdjMR}_j - \widehat{MR}\right)^2}{n^2}} \qquad (3)$$

$\widehat{MR}$ is also the estimate for MR [3], where MR is the mean of man-hours divided by the
mean of miles based on the population. Let $\widehat{SE}$ serve as the estimate for the SE of
$\widehat{MR}$. We know that this is not the best SE estimate for $\widehat{MR}$ because it
doesn't fully account for variation in miles. Nevertheless, using these estimates in the
procedures described in the next section leads to excellent coverage, non-coverage and
efficiency (distributions and sizes of CI lengths) results for many ratio mean problems.
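
The mean and SE estimates of equations (2) and (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions: AdjMR_j is taken to be vehicle j's man-hours divided by the sample mean of miles, the SE uses an n^2 divisor, and the function name `adj_mr_estimates` and the data are hypothetical.

```python
import math


def adj_mr_estimates(man_hours, miles):
    """Return (MR-hat, SE-hat) computed from the AdjMR values."""
    n = len(man_hours)
    mean_miles = sum(miles) / n
    # AdjMR_j: man-hours of vehicle j scaled by the sample mean of miles.
    adj = [mh / mean_miles for mh in man_hours]
    mr_hat = sum(adj) / n
    # Plug-in SE of an arithmetic mean: sqrt(sum of squared deviations / n^2).
    se_hat = math.sqrt(sum((a - mr_hat) ** 2 for a in adj)) / n
    return mr_hat, se_hat


# Hypothetical data for five vehicles; illustrative only.
man_hours = [12.0, 8.0, 20.0, 5.0, 15.0]
miles = [300.0, 250.0, 400.0, 150.0, 400.0]
mr_hat, se_hat = adj_mr_estimates(man_hours, miles)
print(f"MR-hat = {mr_hat:.4f}, SE-hat = {se_hat:.5f}")
```

Note that `se_hat` depends on miles only through their mean, which is the limitation the text acknowledges.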

Standardized  Z  estimate  for  ratio  mean  &  the  CI


It follows from the bootstrap-t that bootstrapping (based on a random sample) can be
used to obtain an estimate for the standardized Z distribution [1]. In statistics, bootstrapping
is a modern, computer-intensive, general-purpose approach to statistical inference, falling
within a broader class of re-sampling methods. A bootstrap sample (BSS) of n vehicles is a random
sample drawn one vehicle at a time, with replacement, from the original sample of n vehicles. The
probability of selecting a given vehicle in each of the n random selections is 1/n.


Bootstrapping has been in existence for over 25 years and has facilitated solutions of
various kinds of problems (confidence intervals, variance reduction, hypothesis testing, etc.).
This research uses bootstrapping to generate an approximate CI around a ratio mean. This is
done by generating a large number of BSS's through a Monte Carlo [4] simulation procedure.


Now, let's estimate a bootstrap standardized Z distribution from the sample of n vehicles.
Generate B BSS's of size n from the sample of n vehicles. Each of the n elements of BSS i
(i = 1 to B) will contain all vehicle information (i.e. man-hours, miles). For each BSS, compute
the mean and SE estimates the same way as was done for the original raw sample. Compute a
standardized Z value for each BSS [1]:


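$$Z(i) = \frac{\widehat{MR}(i) - \widehat{MR}}{\widehat{SE}(i)} \qquad (4)$$

where $\widehat{MR}(i)$ is an estimate of the mean for the i-th BSS,

$$\widehat{SE}(i) = \sqrt{\frac{\sum_{j=1}^{n} \left[\text{AdjMR}_j(i) - \widehat{MR}(i)\right]^2}{n^2}}$$

is an estimate of the SE for the i-th BSS, and $\widehat{MR}$ is the original sample mean.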
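
The resampling loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `mean_and_se` and `bootstrap_z`, the default B = 2000, the seed and the data are all assumptions. Skipping resamples whose SE is zero (for example, when the same vehicle is drawn n times) is also an assumption, since the text does not say how such cases are handled.

```python
import math
import random


def mean_and_se(sample):
    """(MR-hat, SE-hat) for a list of (man_hours, miles) pairs."""
    n = len(sample)
    mean_miles = sum(mi for _, mi in sample) / n
    adj = [mh / mean_miles for mh, _ in sample]
    mr = sum(adj) / n
    se = math.sqrt(sum((a - mr) ** 2 for a in adj)) / n
    return mr, se


def bootstrap_z(sample, b=2000, seed=0):
    """Standardized Z values from b bootstrap samples of whole vehicles."""
    rng = random.Random(seed)
    n = len(sample)
    mr_hat, _ = mean_and_se(sample)
    z_values = []
    for _ in range(b):
        # Draw n vehicles with replacement; each pair stays intact.
        bss = [sample[rng.randrange(n)] for _ in range(n)]
        mr_i, se_i = mean_and_se(bss)  # mean of miles is recomputed per BSS
        if se_i > 0.0:  # assumption: drop degenerate resamples
            z_values.append((mr_i - mr_hat) / se_i)
    return z_values


# Hypothetical (man-hours, miles) pairs for five vehicles.
sample = [(12.0, 300.0), (8.0, 250.0), (20.0, 400.0), (5.0, 150.0), (15.0, 400.0)]
z = bootstrap_z(sample, b=500, seed=1)
print(f"{len(z)} usable Z values out of 500 resamples")
```

Because each resample keeps the (man-hours, miles) pair together and recomputes the mean of miles, the Z values carry information about miles and about the pairing that the SE estimate alone does not.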
Now we use the ($\widehat{SE}$, $\widehat{MR}$, and Z) estimates to compute an approximate
2-sided 100(1-2α)% confidence interval (α is the desired area in each tail) for the ratio mean
metric. First compute the α-th percentile and the (1-α)-th percentile of the Z distribution and
call them $t_{\alpha}$ and $t_{1-\alpha}$, respectively. It follows that the 100(1-2α)% CI is [1]:

Lower CI bound = $\widehat{MR} - \widehat{SE} \cdot t_{1-\alpha}$ and

Upper CI bound = $\widehat{MR} - \widehat{SE} \cdot t_{\alpha}$ (5)
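
As a hedged sketch of the interval in equation (5), the Python below turns a set of Z values into the two bounds. The function names, the linear-interpolation percentile rule (one of several common conventions; the text does not specify one) and the example Z values are assumptions made for illustration.

```python
def percentile(sorted_values, p):
    """Percentile by linear interpolation, for p in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    k = p * (len(sorted_values) - 1)
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (k - lo) * (sorted_values[hi] - sorted_values[lo])


def bootstrap_t_interval(mr_hat, se_hat, z_values, alpha=0.05):
    """Two-sided 100(1 - 2*alpha)% bootstrap-t interval."""
    z = sorted(z_values)
    t_alpha = percentile(z, alpha)
    t_one_minus_alpha = percentile(z, 1.0 - alpha)
    # The upper percentile sets the lower bound, and vice versa.
    lower = mr_hat - se_hat * t_one_minus_alpha
    upper = mr_hat - se_hat * t_alpha
    return lower, upper


# Hypothetical, right-skewed Z values; alpha = 0.25 gives a 50% interval.
lower, upper = bootstrap_t_interval(0.04, 0.01, [-1.0, -0.5, 0.0, 2.0, 4.0], alpha=0.25)
print(f"CI = ({lower:.3f}, {upper:.3f})")
```

With a skewed Z distribution the interval is not symmetric about the estimate, which is one way the bootstrap-t differs from a normal-theory interval.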

The  Z  distribution  accounts  for  the  correlation  of  man-hours  &  miles  and  the  variation  in 
man-hours  and  miles.  The  following  mathematical  argument  creates  the  framework  for 
proceeding  to  the  next  level  of  validation. 


Recall that $\widehat{SE}$ was used to serve as the estimate for the SE of $\widehat{MR}$. Also
recall that we know that this is not the best SE estimate for $\widehat{MR}$ because it doesn't
fully account for variation in miles. The argument and hypothesis are that the product
$\widehat{SE} \cdot t_{(\alpha \text{ or } 1-\alpha)}$ is not compromised because the lack of
variation of miles in $\widehat{SE}$ is accounted for in the Z distribution and ultimately in
$t_{(\alpha \text{ or } 1-\alpha)}$.

This  hypothesis  is  tested  and  validated  by  simulating  the  confidence  interval  properties 
(coverage,  non-coverage  and  efficiency)  for  many  ratio  mean  problems.  It  was  shown  that  the 
validation  results  for  various  ratio  mean  problems  were  excellent  for  many  kinds  of  scenarios. 
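
A coverage check of the kind described above can be sketched as a small simulation. Everything specific here is an assumption for illustration: the data-generating model (uniform miles, man-hours linear in miles plus exponential noise), sampling from an infinite rather than a finite population, the trial counts, the simple index-based percentile rule and all function names. It does not reproduce the paper's validation study.

```python
import math
import random


def mean_and_se(sample):
    """(MR-hat, SE-hat) for a list of (man_hours, miles) pairs."""
    n = len(sample)
    mean_miles = sum(mi for _, mi in sample) / n
    adj = [mh / mean_miles for mh, _ in sample]
    mr = sum(adj) / n
    return mr, math.sqrt(sum((a - mr) ** 2 for a in adj)) / n


def cim4rm_interval(sample, rng, b=200, alpha=0.05):
    """Bootstrap-t interval built from resampled whole vehicles."""
    n = len(sample)
    mr_hat, se_hat = mean_and_se(sample)
    z = []
    for _ in range(b):
        bss = [sample[rng.randrange(n)] for _ in range(n)]
        mr_i, se_i = mean_and_se(bss)
        if se_i > 0.0:  # assumption: drop degenerate resamples
            z.append((mr_i - mr_hat) / se_i)
    if not z:
        raise ValueError("no usable bootstrap samples")
    z.sort()
    t_lo = z[int(alpha * (len(z) - 1))]
    t_hi = z[int(math.ceil((1.0 - alpha) * (len(z) - 1)))]
    return mr_hat - se_hat * t_hi, mr_hat - se_hat * t_lo


def simulate_coverage(trials=150, n=20, seed=0):
    """Fraction of nominal 90% intervals that contain the true MR."""
    rng = random.Random(seed)
    # True MR = E[man-hours] / E[miles] under the assumed model below.
    true_mr = (0.04 * 300.0 + 4.0) / 300.0
    hits = 0
    for _ in range(trials):
        sample = []
        for _ in range(n):
            miles = rng.uniform(100.0, 500.0)                     # mean 300
            man_hours = 0.04 * miles + rng.expovariate(1 / 4.0)   # noise mean 4
            sample.append((man_hours, miles))
        lo, hi = cim4rm_interval(sample, rng)
        hits += lo <= true_mr <= hi
    return hits / trials


coverage = simulate_coverage()
print(f"estimated coverage of nominal 90% CI: {coverage:.2f}")
```

Non-coverage in each tail and the distribution of interval lengths could be tallied in the same loop to examine the other properties the text mentions.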
3.  CONCLUSIONS


Many situations require reliable approximate confidence intervals for ratio means.
CIM4RM is a tool that was developed to satisfy this need. Prior to CIM4RM, no documented
tool existed that could produce stable validation results.

Coverage and its related properties for CIM4RM were tested with many ratio mean
problems and were shown to perform very well for various measures of sample size, correlation,
location and distribution mix. Therefore, the CIM4RM methodology is a reliable and stable tool
for building CIs around ratio means.

The only scenarios where coverage starts to deviate from the required level are
extremely high correlation with highly skewed data, outliers in the data that cause the
location parameter to shift, and extremely small sample sizes. When the two variables for
the ratio mean are highly correlated, the confidence interval tends to be extremely short.

The Army is currently using this methodology to quantitatively analyze maintenance ratios
and other ratio mean performance metrics for fielded Army ground and aviation systems. The
Office of Inspector General for the Department of Health and Human Services is utilizing this
methodology for reporting ratio mean confidence intervals to the U.S. Congress. Some other
existing applications include: performance evaluations for Army test systems, evaluations of an
Improvised Explosive Device Detection demonstration, hypothesis testing development for many
applications that compare two ratio means, Aging Effects hypothesis testing, and paired
reliability hypothesis testing.

Although current applications are government-centric, there are countless other areas in
private industry (e.g. banking, automotive) where CIM4RM can be used for improving decision
analysis.